GENOMIC SELECTION
Accuracy of Genomic Selection Methods in a Standard Data Set of Loblolly Pine (Pinus taeda L.)
Abstract
Genomic selection can increase genetic gain per generation through early selection. It is expected to be particularly valuable for traits that are costly to phenotype and expressed late in the life cycle of long-lived species. Alternative approaches to genomic selection prediction models may perform differently for traits with distinct genetic properties. Here we present the performance of four original methods of genomic selection that differ in their assumptions about the distribution of marker effects: (i) ridge regression–best linear unbiased prediction (RR–BLUP), (ii) Bayes A, (iii) Bayes Cπ, and (iv) Bayesian LASSO. In addition, a modified RR–BLUP (RR–BLUP B) that utilizes a selected subset of markers was evaluated. The accuracy of these methods was compared across 17 traits with distinct heritabilities and genetic architectures, including growth, development, and disease-resistance properties, measured in a Pinus taeda (loblolly pine) training population of 951 individuals genotyped with 4853 SNPs. The predictive ability of the methods was evaluated using 10-fold cross-validation and differed only marginally for most method/trait combinations. For fusiform rust disease-resistance traits, however, Bayes Cπ, Bayes A, and RR–BLUP B had higher predictive ability than RR–BLUP and Bayesian LASSO. Fusiform rust resistance is controlled by few genes of large effect. A limitation of RR–BLUP is the assumption of equal contribution of all markers to the observed variation. However, RR–BLUP B performed as well as the Bayesian approaches. The genotypic and phenotypic data used in this study are publicly available for comparative analysis of genomic selection prediction models.

Plant and animal breeders have effectively used phenotypic selection to increase the mean performance in selected populations.
For many traits, phenotypic selection is costly and time consuming, especially for traits expressed late in the life cycle of long-lived species. Genome-wide selection (GWS) (Meuwissen et al. 2001) was proposed as an approach to accelerating the breeding cycle. In GWS, trait-specific models predict phenotypes using dense molecular markers from a base population. These predictions are applied to genotypic information in subsequent generations to estimate direct genetic values (DGV).

Several analytical approaches have been proposed for genome-based prediction of genetic values, and these differ with respect to assumptions about the marker effects (de los Campos et al. 2009a; Habier et al. 2011; Meuwissen et al. 2001). For example, ridge regression–best linear unbiased prediction (RR–BLUP) assumes that all marker effects are normally distributed and that these marker effects have identical variance (Meuwissen et al. 2001). In Bayes A, markers are assumed to have different variances, which are modeled as following a scaled inverse χ² distribution (Meuwissen et al. 2001). The prior in Bayes B (Meuwissen et al. 2001) assumes that the variance of a marker equals zero with probability π and, with the complementary probability (1 – π), follows an inverse χ² distribution with ν degrees of freedom and scale parameter S. The definition of the probability π depends on the genetic architecture of the trait, suggesting an improvement to the Bayes B model, known as Bayes Cπ. In Bayes Cπ, the mixture probability π has a prior uniform distribution (Habier et al. 2011). A drawback of Bayesian methods is the need for the definition of priors. The requirement of a prior for the parameter π is circumvented in the Bayesian LASSO method, which needs less information (de los Campos et al. 2009b; Legarra et al. 2011b).

Methods for genomic prediction of genetic values may perform differently for different phenotypes (Meuwissen et al. 2001; Usai et al. 2009; Habier et al. 2011), and results may diverge because of differences in genetic architecture among traits (Hayes et al. 2009; Grattapaglia and Resende 2011). Therefore, it is valuable to compare performance among methods with real data and identify those that provide more accurate predictions.

Recently, GWS was applied to agricultural crops (Crossa et al. 2010) and trees (Resende et al. 2011). Here we report, for an experimental breeding population of the tree species loblolly pine (Pinus taeda L.), a comparison of GWS predictive models for 17 traits with different heritabilities and predicted genetic architectures. Genome-wide selection models included RR–BLUP, Bayes A, Bayes Cπ, and the Bayesian LASSO. In addition, we evaluated a modified RR–BLUP method that utilizes a subset of selected markers, RR–BLUP B. We show that, for most traits, there is limited difference among these four original methods in their ability to predict GBV. Bayes Cπ performed better for fusiform rust resistance, a disease-resistance trait shown previously to be controlled in part by major genes, and the proposed method RR–BLUP B was similar to or better than Bayes Cπ when a subsample of markers was fitted to the model.

Copyright © 2012 by the Genetics Society of America. doi: 10.1534/genetics.111.137026. Manuscript received November 19, 2011; accepted for publication January 10, 2012. Available freely online through the author-supported open access option. Supporting information is available online at http://www.genetics.org/content/suppl/2012/01/23/genetics.111.137026.DC1. These authors contributed equally to this work. Corresponding author: University of Florida, Newins-Ziegler Hall Rm. 367, Gainesville, FL 32611. E-mail: [email protected]. Genetics, Vol. 190, 1503–1510, April 2012.
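To make the RR–BLUP assumption and the cross-validated predictive ability concrete, the following is a minimal Python sketch, not the authors' implementation. It treats RR–BLUP as ridge regression with a single penalty shared by all markers (the "identical variance" assumption described in the introduction) and measures predictive ability as the correlation between predicted and observed phenotypes in held-out folds. The function names, the 0/1/2 genotype coding, the fixed shrinkage value `lam`, the averaging of per-fold correlations, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def rr_blup_fit(Z, y, lam):
    """Estimate marker effects by ridge regression with one shared penalty.

    Z: (individuals x markers) genotype matrix; y: phenotypes;
    lam: shrinkage parameter, assumed known here (for example, residual
    variance divided by the per-marker effect variance).
    """
    col_means = Z.mean(axis=0)
    Zc = Z - col_means                       # center each marker column
    mu = y.mean()
    lhs = Zc.T @ Zc + lam * np.eye(Z.shape[1])
    effects = np.linalg.solve(lhs, Zc.T @ (y - mu))
    return mu, col_means, effects

def rr_blup_predict(Z, mu, col_means, effects):
    """Predict phenotypes for new genotypes from fitted marker effects."""
    return mu + (Z - col_means) @ effects

def cv_predictive_ability(Z, y, lam, k=10, seed=0):
    """Mean correlation between predicted and observed values over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    correlations = []
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False              # hold this fold out of training
        mu, col_means, effects = rr_blup_fit(Z[train], y[train], lam)
        predicted = rr_blup_predict(Z[test_idx], mu, col_means, effects)
        correlations.append(np.corrcoef(predicted, y[test_idx])[0, 1])
    return float(np.mean(correlations))

# Simulated toy data (hypothetical; not the CCLONES data set).
rng = np.random.default_rng(1)
Z = rng.integers(0, 3, size=(300, 100)).astype(float)   # genotypes coded 0/1/2
genetic = Z @ rng.normal(size=100)                       # every marker has an effect
y = genetic + rng.normal(scale=genetic.std(), size=300)  # heritability near 0.5
ability = cv_predictive_ability(Z, y, lam=50.0)
print(f"mean predictive ability over 10 folds: {ability:.2f}")
```

In practice the shrinkage parameter would be estimated from variance components rather than fixed by hand. The Bayesian methods discussed above differ from this sketch by replacing the single shared penalty with marker-specific variances or mixture priors.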
Materials and Methods
Training population and genotypic data
The loblolly pine population used in this analysis is derived from 32 parents representing a wide range of accessions from the Atlantic coastal plain, Florida, and lower Gulf of the United States. Parents were crossed in a circular mating design with additional off-diagonal crosses, resulting in 70 full-sib families with an average of 13.5 individuals per family (Baltunis et al. 2007a). This population is referred to hereafter as CCLONES (comparing clonal lines on experimental sites). A subset of the CCLONES population, composed of 951 individuals from 61 families (mean, 15; standard deviation, 2.2), was genotyped using an Illumina Infinium assay (Illumina, San Diego, CA; Eckert et al. 2010) with 7216 SNPs, each representing a unique pine EST contig. A subset of 4853 SNPs were polymorphic in this population and were used in this study. None of the markers were excluded on the basis of minimum allele frequency. Genotypic data and pedigree information are available in the Supporting Information, File S1 and File S2.
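The marker-retention rule described above (keep every polymorphic SNP, with no minimum allele frequency threshold) can be sketched as follows. This is an illustrative sketch only: the 0/1/2 genotype coding, the use of NaN for missing calls, and the name `polymorphic_mask` are assumptions, not details taken from File S1.

```python
import numpy as np

def polymorphic_mask(genotypes):
    """Flag markers that show more than one genotype class.

    genotypes: (individuals x markers) array coded 0/1/2, with NaN for
    missing calls (an assumed encoding). A marker is kept whenever at least
    two distinct genotype values are observed, however rare the minor
    allele is, so no allele frequency threshold is applied.
    """
    missing = np.isnan(genotypes)
    col_min = np.where(missing, np.inf, genotypes).min(axis=0)
    col_max = np.where(missing, -np.inf, genotypes).max(axis=0)
    return col_max > col_min                 # False for fixed or all-missing markers

# Hypothetical toy matrix: 4 individuals x 4 markers.
toy = np.array([
    [0.0, 1.0, 2.0, np.nan],
    [0.0, 1.0, 2.0, np.nan],
    [0.0, 2.0, 2.0, np.nan],
    [0.0, 0.0, 2.0, np.nan],
])
mask = polymorphic_mask(toy)
filtered = toy[:, mask]                      # only the second marker varies
print(mask.tolist(), filtered.shape)
```

Applied to a matrix with the dimensions reported above, a filter of this kind would reduce the 7216 assayed SNPs to the polymorphic subset; the exact criterion used by the authors is not stated beyond "polymorphic in this population".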